Goto

Collaborating Authors

 virtual loss


Reviews: Online Learning with a Hint

Neural Information Processing Systems

The paper concerns online linear optimization where at each trial, the player, prior to prediction, receives a hint about the loss function. The hint has a form of a unit vector which is weakly correlated with the loss vector (its angle's cosine with loss vector is at least alpha). The paper shows that: - When the set of feasible actions is strongly convex, there exists an algorithm which gets logarithmic regret (in T). The algorithm is obtained by a reduction to the online learning problem with exp-concave losses. The bound is unimprovable in general, as shown in the Lower Bounds section.


Practical Large-Scale Distributed Parallel Monte-Carlo Tree Search Applied to Molecular Design

arXiv.org Artificial Intelligence

It is common practice to use large computational resources to train neural networks, as is known from many examples, such as reinforcement learning applications. However, while massively parallel computing is often used for training models, it is rarely used for searching solutions for combinatorial optimization problems. In this paper, we propose to apply a hash function based distributed parallel Monte-Carlo Tree Search (MCTS) to a real-world problem of molecular design. By running our massively parallel MCTS combined with a simple RNN on 1024 CPU cores for 10 minutes, we achieved a score on a molecular design problem that significantly outperforms existing work. Whereas existing studies on massively scalable parallel MCTS only compare the number of rollouts, we prove the practicality of the algorithm by comparing the quality of the solutions obtained in practice. This method is generic and is expected to speed up other applications of MCTS.


Single-Agent Optimization Through Policy Iteration Using Monte-Carlo Tree Search

arXiv.org Artificial Intelligence

The combination of Monte-Carlo Tree Search (MCTS) and deep reinforcement learning is state-of-the-art in two-player perfect-information games. In this paper, we describe a search algorithm that uses a variant of MCTS which we enhanced by 1) a novel action value normalization mechanism for games with potentially unbounded rewards (which is the case in many optimization problems), 2) defining a virtual loss function that enables effective search parallelization, and 3) a policy network, trained by generations of self-play, to guide the search. We gauge the effectiveness of our method in "SameGame"---a popular single-player test domain. Our experimental results indicate that our method outperforms baseline algorithms on several board sizes. Additionally, it is competitive with state-of-the-art search algorithms on a public set of positions.